An improved algorithm for unsupervised decomposition of a multi-author document

Author

  • Chris Giannella
Abstract

This paper addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author when the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, then a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that must be set and that had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracies of BayesAD and AK were, in all but one case, worse than that of a baseline approach in which one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists.
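
The single-author baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes accuracy is measured as the fraction of sentences attributed to their true author, which may differ from the exact metric used in the paper; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def single_author_baseline_accuracy(true_authors):
    """Accuracy when every sentence is attributed to one single author.

    Assumption: accuracy = fraction of sentences whose assigned author
    matches the true author. The best single guess is then the author
    who wrote the most sentences.
    """
    if not true_authors:
        raise ValueError("need at least one sentence label")
    # Count sentences per true author and keep the largest count.
    _, majority_count = Counter(true_authors).most_common(1)[0]
    return majority_count / len(true_authors)


# Hypothetical per-sentence ground truth for a three-author document.
labels = ["A", "A", "B", "A", "B", "C"]
baseline = single_author_baseline_accuracy(labels)
print(f"single-author baseline accuracy: {baseline:.2f}")
```

Under this assumption, a decomposition method is only useful when its accuracy exceeds the share of sentences written by the most prolific author, which is why the baseline becomes hard to beat when one author dominates the document.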

Journal:
  • JASIST

Volume 67

Publication date 2016